
    Qualitative Effects of Knowledge Rules in Probabilistic Data Integration

    One of the problems in data integration is data overlap: the fact that different data sources hold data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Advanced similarity-measurement techniques are often used to remove semantic duplicates from the integration result or to resolve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, so that the integration result can already be meaningfully used. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on integration quality. It shows that setting rough but safe thresholds and defining only a few rules suffices to produce a 'good enough' integration that can be meaningfully used, i.e., that our approach indeed reduces development effort rather than merely shifting it to rule definition and threshold tuning.
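
    A minimal sketch of the thresholding idea, assuming hypothetical threshold values and a generic pairwise similarity function (none of these come from the report): two rough "safe" thresholds partition candidate entity pairs so that confident matches are merged, confident non-matches are kept apart, and the uncertain remainder is stored with a probability instead of being resolved by hand.

        # Illustrative sketch in Python; T_LOW, T_HIGH and the similarity
        # function are hypothetical, not taken from the report.
        T_LOW, T_HIGH = 0.3, 0.9

        def integrate(pairs, similarity):
            matches, uncertain = [], []
            for a, b in pairs:
                s = similarity(a, b)
                if s >= T_HIGH:
                    matches.append((a, b))       # treat as the same entity
                elif s >= T_LOW:
                    uncertain.append((a, b, s))  # keep both alternatives, p = s
                # below T_LOW: treat as distinct entities
            return matches, uncertain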

    Quality Measures in Uncertain Data Management

    Many applications deal with data that is uncertain, for example applications dealing with sensor information, data integration, and healthcare. Instead of each application having to deal with the uncertainty itself, it should be the responsibility of the DBMS to manage all data, including uncertain data. Several research projects address this topic. In this paper, we introduce four measures for assessing and comparing important characteristics of uncertain data and of the systems that manage it.

    User Feedback in Probabilistic XML

    Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas this is impractical or even prohibitive, for example, in an ambient environment where devices have to exchange information autonomously on an ad hoc basis. We have proposed a probabilistic XML approach that allows data integration without user involvement by storing semantic uncertainty and conflicts in the integrated XML data. As a consequence, the integrated information source represents all possible appearances of objects in the real world, the so-called possible worlds. In this paper, we show how user feedback on query results can resolve semantic uncertainty and conflicts in the integrated data. Hence, user involvement is effectively postponed to query time, when a user is already interacting actively with the system. The technique relates positive and negative statements on query answers to the possible worlds of the information source, thereby either reinforcing, penalizing, or eliminating possible worlds. We show that after repeated user feedback, an integrated information source better resembles the real world and may converge towards a non-probabilistic information source.
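
    A minimal sketch of the feedback loop under the possible-worlds reading; the representation (a list of fact-sets with probabilities) and the penalty factor are assumptions for illustration, not the paper's probabilistic XML encoding.

        # Each possible world is a set of facts with a probability. Positive
        # feedback on an answer penalizes worlds that contradict it; negative
        # feedback eliminates worlds that contain it; weights are renormalized.
        def apply_feedback(worlds, answer, positive, penalty=0.5):
            updated = []
            for facts, p in worlds:
                if positive and answer not in facts:
                    p *= penalty      # penalize contradicting worlds
                elif not positive and answer in facts:
                    p = 0.0           # eliminate rejected worlds
                if p > 0:
                    updated.append((facts, p))
            total = sum(p for _, p in updated)
            assert total > 0, "feedback eliminated all worlds"
            return [(facts, p / total) for facts, p in updated]

    Repeated calls concentrate probability mass on fewer worlds, which mirrors the claimed convergence towards a non-probabilistic source.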

    Short and random: Modelling the effects of (proto-)neural elongations

    To understand how neurons and nervous systems first evolved, we need an account of the origins of neural elongations: Why did neural elongations (axons and dendrites) first originate, such that they could become the central component of both neurons and nervous systems? Two contrasting conceptual accounts provide different answers to this question. Braitenberg's vehicles provide the iconic illustration of the dominant input-output (IO) view. Here the basic role of neural elongations is to connect sensors to effectors, both situated at different positions within the body. For this function, neural elongations are thought of as comparatively long and specific connections, which require an articulated body involving substantial developmental processes to build. Internal coordination (IC) models stress a different function for early nervous systems. Here the coordination of activity across extended parts of a multicellular body is held central, in particular for the contractions of (muscle) tissue. An IC perspective allows the hypothesis that the earliest proto-neural elongations could have been functional even when they were initially simple, short, and random connections, as long as they enhanced the patterning of contractile activity across a multicellular surface. The present computational study provides a proof of concept that such short and random neural elongations can play this role. While an excitable epithelium can generate basic forms of patterning for small body configurations, adding elongations allows such patterning to scale up to larger bodies. This result supports a new, more gradual evolutionary route towards the origins of the very first full neurons and nervous systems. (12 pages, 5 figures. Keywords: early nervous systems, neural elongations, nervous system evolution, computational modelling, internal coordination)
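
    A toy version of this kind of setup, as a sketch only: a one-dimensional ring of excitable cells stands in for the epithelium, and a handful of short random extra links stand in for proto-neural elongations. All sizes and parameters are invented for illustration and are not the study's actual model.

        import random

        N = 200                          # cells in the excitable sheet (a ring)
        rng = random.Random(1)
        # a few random "elongations": extra links from one cell to another
        elongations = {i: rng.randrange(N) for i in rng.sample(range(N), 20)}

        def step(state):
            """One update of a three-state excitable medium."""
            nxt = list(state)
            for i, s in enumerate(state):
                if s == "excited":
                    nxt[i] = "refractory"
                elif s == "refractory":
                    nxt[i] = "rest"
                else:                    # a resting cell fires if a connected cell fired
                    nbrs = [(i - 1) % N, (i + 1) % N]
                    if i in elongations:
                        nbrs.append(elongations[i])
                    if any(state[j] == "excited" for j in nbrs):
                        nxt[i] = "excited"
            return nxt

        state = ["rest"] * N
        state[0] = "excited"
        for _ in range(50):
            state = step(state)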

    How Noisy Data Affects Geometric Semantic Genetic Programming

    Noise is a consequence of acquiring and pre-processing data from the environment, and shows fluctuations from different sources, e.g., sensors, signal-processing technology, or even human error. As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensitivity to noisy data. However, there is no systematic study on this matter. This paper performs a deep analysis of GSGP performance in the presence of noise. Using 15 synthetic datasets where noise can be controlled, we added different ratios of noise to the data and compared the results with those obtained by a canonical GP. The results show that, as the percentage of noisy instances increases, the degradation in generalization performance is more pronounced in GSGP than in GP. However, in general, GSGP is more robust to noise than GP in the presence of up to 10% noise, and presents no statistically significant difference for values higher than that in the test bed. (8 pages. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, Germany)
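
    A sketch of the kind of noise-injection protocol described, where a controlled ratio of instances is perturbed; the Gaussian perturbation and its magnitude are assumptions for illustration, not the paper's exact procedure.

        import random

        def add_noise(targets, ratio, sigma=0.1, seed=0):
            """Perturb the target value of a `ratio` fraction of instances."""
            rng = random.Random(seed)
            noisy = list(targets)
            for i in rng.sample(range(len(noisy)), int(ratio * len(noisy))):
                noisy[i] += rng.gauss(0.0, sigma)
            return noisy

        # e.g. compare GP vs. GSGP trained on add_noise(y, 0.05),
        # add_noise(y, 0.10), ... while testing on the clean targets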

    Report on the First VLDB Workshop on Management of Uncertain Data (MUD 2007)

    On Monday, September 24th, we organized the first international VLDB workshop on Management of Uncertain Data [dKvKD07]. The idea of this workshop arose a year earlier at the Twente Data Management Workshop on Uncertainty in Databases [dKvK06]. The TDM is a bi-annual workshop organized by the Database group of the University of Twente, with a different topic chosen each time. The participants of TDM 2006 were enthusiastic about the topic "Uncertainty in Databases" and strongly expressed the wish for a follow-up co-located with an international conference. To fulfill this wish, we organized the MUD workshop at VLDB.

    Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS

    Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS using data and query translation techniques together with a small number of stored procedures. This paper describes Trio-One's translation scheme and system architecture, showing how it efficiently and easily supports the Trio data model and query language.
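
    A rough sketch of the layering idea, with an invented schema and column names (not Trio-One's actual translation scheme): an uncertain relation is encoded as an ordinary table with extra columns for the alternative identifier and its confidence, and queries are rewritten to carry those columns along so results retain uncertainty and lineage.

        # Hypothetical encoding of an uncertain relation R(a, b) on a
        # conventional DBMS; xid/aid/conf are invented column names.
        ENCODED_SCHEMA = """
        CREATE TABLE r_enc (
            xid  INTEGER,   -- identifier of the uncertain tuple (x-tuple)
            aid  INTEGER,   -- which alternative of the x-tuple this row is
            conf REAL,      -- confidence of this alternative
            a TEXT, b TEXT
        );
        """

        def rewrite(select_cols, table, where):
            # Propagate xid/aid/conf so the result keeps its uncertainty and
            # can be traced back to the input rows it was derived from.
            return (f"SELECT {select_cols}, xid, aid, conf "
                    f"FROM {table}_enc WHERE {where}")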

    Efficient Equilibria in Polymatrix Coordination Games

    We consider polymatrix coordination games with individual preferences, where every player corresponds to a node in a graph and plays a separate bimatrix game with non-negative symmetric payoffs with each neighbor. In this paper, we study α-approximate k-equilibria of these games, i.e., outcomes where no group of at most k players can deviate such that each member increases his payoff by at least a factor α. We prove that for α ≥ 2 these games have the finite coalitional improvement property (and thus α-approximate k-equilibria exist), while for α < 2 this property does not hold. Further, we derive an almost tight bound of 2α(n−1)/(k−1) on the price of anarchy, where n is the number of players; in particular, it scales from unbounded for pure Nash equilibria (k = 1) to 2α for strong equilibria (k = n). We also settle the complexity of several problems related to the verification and existence of these equilibria. Finally, we investigate natural means to reduce the inefficiency of Nash equilibria. Most promisingly, we show that by fixing the strategies of k players the price of anarchy can be reduced to n/k (and this bound is tight).
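
    A small sketch of the model and of an α-approximate equilibrium check for the k = 1 case (single deviators); the data layout and the strict-inequality reading of "improves by a factor α" are illustrative assumptions, not the paper's formal definitions.

        def payoff(player, strategy, edges):
            """Sum of bimatrix payoffs over all edges incident to `player`.
            edges: {(u, v): M}, where both endpoints receive M[s_u][s_v]
            (the payoffs are symmetric in this sense)."""
            total = 0.0
            for (u, v), M in edges.items():
                if player in (u, v):
                    total += M[strategy[u]][strategy[v]]
            return total

        def is_alpha_nash(strategy, edges, strategies, alpha):
            """True iff no single player can multiply his payoff by more
            than alpha via a unilateral deviation."""
            for p in strategy:
                cur = payoff(p, strategy, edges)
                for s in strategies:
                    dev = dict(strategy)
                    dev[p] = s
                    if payoff(p, dev, edges) > alpha * cur:
                        return False
            return True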